Structure-Based Chemical Shift Prediction Using Random Forests Non-Linear Regression

Authors

  • K. Arun
  • Christopher James Langmead
Abstract

Protein nuclear magnetic resonance (NMR) chemical shifts are among the most accurately measurable spectroscopic parameters and are closely correlated with protein structure because they depend on the local electronic environment. The precise nature of this correlation remains largely unknown. Accurate prediction of chemical shifts from the atomic coordinates of existing structures will permit close study of this relationship. This paper presents a novel approach to chemical shift prediction from protein structure based on non-linear regression. The regression model combines quantum, classical and empirical variables and provides a statistically significant improvement in prediction accuracy over existing chemical shift predictors across protein backbone atom types. The results presented here were obtained using the Random Forest regression algorithm on a data set of protein entries derived from the RefDB re-referenced chemical shift database.
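
As a rough illustration of the kind of regression the abstract describes, the sketch below fits a small random forest (bootstrap-sampled regression trees with a random feature subset at each split) to synthetic data. It is not the authors' model: the three input features (two dihedral-like angles and a distance-like term), the target function, the noise level and all hyperparameters are assumptions made up for this example, and no RefDB data is used.

```python
import math
import random
from statistics import fmean


def best_split(rows, targets, features, min_leaf):
    """Return (score, feature, threshold) of the lowest-error split, or None."""
    n, total = len(rows), sum(targets)
    best = None
    for f in features:
        order = sorted(range(n), key=lambda i: rows[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += targets[order[k - 1]]
            a, b = rows[order[k - 1]][f], rows[order[k]][f]
            if k < min_leaf or n - k < min_leaf or a == b:
                continue
            # Minimising the summed squared error is the same as maximising this score.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, f, a)
    return best


def grow(rows, targets, n_try, rng, min_leaf=3, max_depth=8, depth=0):
    """Grow one regression tree; each split looks at n_try randomly chosen features."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return fmean(targets)  # leaf: mean target of the samples that reach it
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, targets, features, min_leaf)
    if split is None:
        return fmean(targets)
    _, f, thr = split
    left = [i for i, r in enumerate(rows) if r[f] <= thr]
    right = [i for i, r in enumerate(rows) if r[f] > thr]
    return (
        f,
        thr,
        grow([rows[i] for i in left], [targets[i] for i in left],
             n_try, rng, min_leaf, max_depth, depth + 1),
        grow([rows[i] for i in right], [targets[i] for i in right],
             n_try, rng, min_leaf, max_depth, depth + 1),
    )


def fit_forest(rows, targets, n_trees=30, n_try=2, seed=0):
    """Fit n_trees trees, each on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow([rows[i] for i in idx], [targets[i] for i in idx], n_try, rng))
    return forest


def predict(forest, x):
    """Average the per-tree predictions for one feature vector."""
    preds = []
    for node in forest:
        while isinstance(node, tuple):
            f, thr, left, right = node
            node = left if x[f] <= thr else right
        preds.append(node)
    return fmean(preds)


def rmse(pred, true):
    return math.sqrt(fmean([(p - t) ** 2 for p, t in zip(pred, true)]))


# Synthetic stand-in data (assumed, not from the paper): a non-linear "shift".
data_rng = random.Random(42)


def sample(n):
    X = [[data_rng.uniform(-math.pi, math.pi), data_rng.uniform(-math.pi, math.pi),
          data_rng.uniform(0.0, 2.0)] for _ in range(n)]
    y = [2.0 * math.sin(p) + 1.5 * math.cos(s) + 0.5 * d ** 2 + data_rng.gauss(0.0, 0.1)
         for p, s, d in X]
    return X, y


X_train, y_train = sample(300)
X_test, y_test = sample(100)
forest = fit_forest(X_train, y_train)

rmse_forest = rmse([predict(forest, x) for x in X_test], y_test)
rmse_mean = rmse([fmean(y_train)] * len(y_test), y_test)  # baseline: predict the mean
print(f"forest RMSE: {rmse_forest:.3f}  mean-baseline RMSE: {rmse_mean:.3f}")
```

Comparing the held-out error against a predict-the-mean baseline is only a sanity check that the forest has learned the non-linear dependence; it says nothing about the accuracy reported in the paper.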

Similar resources

Prediction of Red Mud Bound-Soda Losses in Bayer Process Using Neural Networks

In the Bayer process, the reaction of silica in bauxite with caustic soda causes the loss of a large amount of NaOH. In this research, the bound-soda losses in Bayer process solid residue (red mud) are predicted using intelligent techniques. This method, based on the application of regression and artificial neural networks (ANN), has been used to predict red mud bound-soda losses in Iran Alumina C...

Regression Trees and Random forest based feature selection for malaria risk exposure prediction

This paper deals with predicting the number of Anopheles mosquitoes, the main vector of malaria, using environmental and climate variables. Variable selection is based on an automatic machine learning method using regression trees and random forests combined with stratified two-level cross-validation. The minimum threshold of variable importance is assessed using the quadratic distance of variabl...

Random forests for survival analysis using maximally selected rank statistics

The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption is not always fulfilled. An alternative approach is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible sp...

Prediction of toxicity of aliphatic carboxylic acids using adaptive neuro-fuzzy inference system

Toxicity of 38 aliphatic carboxylic acids was studied using non-linear quantitative structure-toxicity relationship (QSTR) models. The adaptive neuro-fuzzy inference system (ANFIS) was used to construct the nonlinear QSTR models in all stages of study. Two ANFIS models were developed based upon different subsets of descriptors. The first one used log K_ow and E_LUMO as inputs and had good predicti...

Evaluating Random Forests for Survival Analysis using Prediction Error Curves.

Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right censored data and several variants of cross-validation to deal with the apparent err...

Publication date: 2006